Data analysis

Question 1. What are the most common crime types? Which borough had the most crimes?

Let's visualize it!

Let's find out which borough has the most crimes.
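Both questions come down to frequency counts. The following is a minimal pandas sketch. It assumes the complaints are loaded into a DataFrame with the NYPD complaint-data columns `OFNS_DESC` (offense description) and `BORO_NM` (borough). The six rows below are made up for illustration and are not real counts.

```python
import pandas as pd

# Made-up stand-in for the complaint data (assumed column names).
df = pd.DataFrame({
    "OFNS_DESC": ["PETIT LARCENY", "HARRASSMENT 2", "PETIT LARCENY",
                  "GRAND LARCENY", "PETIT LARCENY", "HARRASSMENT 2"],
    "BORO_NM": ["BROOKLYN", "MANHATTAN", "BROOKLYN",
                "BRONX", "MANHATTAN", "BROOKLYN"],
})

# value_counts() sorts by frequency (descending), so head(5) is the top 5 types.
top_types = df["OFNS_DESC"].value_counts().head(5)

# The borough with the most complaints is the label of the largest count.
boro_counts = df["BORO_NM"].value_counts()
busiest_boro = boro_counts.idxmax()

print(top_types)
print("Borough with the most crimes:", busiest_boro)
```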

---Q1 ANSWER: The results show that PETIT LARCENY is the most common crime type. The top 5 crime types are PETIT LARCENY, HARRASSMENT 2, ASSAULT 3 & RELATED OFFENSES, CRIMINAL MISCHIEF & RELATED OF, and GRAND LARCENY.

Question 2. Have the efforts of the police and the social education system worked? Is the number of crimes declining?

Let's visualize the trend.
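The yearly trend comes from parsing the complaint date and counting rows per year. This sketch assumes a date column named `CMPLNT_FR_DT` in `MM/DD/YYYY` format, which is an assumption you should adjust to the actual column. The rows are made up.

```python
import pandas as pd

# Made-up complaint dates (assumed column CMPLNT_FR_DT, stored as strings).
df = pd.DataFrame({"CMPLNT_FR_DT": [
    "01/15/2017", "03/02/2017", "07/21/2017",
    "02/11/2018", "08/05/2018",
    "07/30/2019",
]})

# Parse the dates; malformed or out-of-range values become NaT instead of raising.
df["date"] = pd.to_datetime(df["CMPLNT_FR_DT"], format="%m/%d/%Y", errors="coerce")

# Count complaints per year, in chronological order.
yearly = df["date"].dt.year.value_counts().sort_index()
print(yearly)
```

Passing `errors="coerce"` keeps a few bad records from stopping the whole count. Those rows simply drop out of the yearly tally.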

---Q2 ANSWER: The number of crimes decreases year by year, which suggests that the efforts of the police and the social education system are working.

Question 3. How does crime vary over the course of a year? Which month has the most crime?

First, let's count the number of crimes in each month since 2012.
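One way to pool all years since 2012 and count by calendar month is shown below. The dates are made up. In practice, `dates` would be the parsed complaint-date column.

```python
import pandas as pd

# Made-up parsed complaint dates standing in for the real date column.
dates = pd.Series(pd.to_datetime([
    "2011-07-04", "2012-01-10", "2012-07-15", "2013-07-01",
    "2013-08-20", "2014-08-09", "2014-02-14",
]))

# Keep complaints from 2012 onward.
recent = dates[dates.dt.year >= 2012]

# Count complaints per calendar month (1 = January ... 12 = December),
# pooling all years; reindex so months with no complaints show 0.
by_month = recent.dt.month.value_counts().reindex(range(1, 13), fill_value=0)
print(by_month)
print("Peak month:", by_month.idxmax())
```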

The data and figures show that July and August have the most crimes; criminals seem to be more active in the summer.

Then, let's focus on the most recent year, 2019, for further analysis.

The 2019 data shows the same pattern as previous years.

---Q3 ANSWER: Throughout the year, the number of crimes rises and falls along a curve. It peaks in July and August, which are the busiest months for the police.

Use the geopy library to get the latitude and longitude of New York City.

Import the folium library to visualize the data on maps.

Because there is so much data, let's focus on August 2019 for the visualization.
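Restricting the map to one month takes only a boolean filter. This sketch assumes a parsed `date` column and `Latitude`/`Longitude` columns, which are hypothetical names here. The rows are made up. Rows without coordinates are dropped because they cannot be placed on a map.

```python
import pandas as pd

# Made-up rows; the column names are assumptions.
df = pd.DataFrame({
    "date": pd.to_datetime(["2019-08-01", "2019-08-15", "2019-07-31",
                            "2018-08-10", "2019-08-30"]),
    "Latitude": [40.75, 40.68, 40.71, 40.80, None],
    "Longitude": [-73.99, -73.94, -74.00, -73.95, -73.90],
})

# Keep only August 2019, then drop rows that have no coordinates.
mask = (df["date"].dt.year == 2019) & (df["date"].dt.month == 8)
aug_2019 = df.loc[mask].dropna(subset=["Latitude", "Longitude"])
print(len(aug_2019), "complaints to map")
```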

We cannot plot every case individually because there are so many, so we cluster them for a clearer visualization.

Define the Foursquare credentials and API version.

Search the whole city and get the top 100 venues.
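This sketch builds the request URL. It assumes the legacy Foursquare v2 `venues/explore` endpoint, since the notebook's exact call is not shown. The credentials, version date, and search radius are placeholders. The coordinates are approximate values for New York City, used only as an example. The function only builds the URL. Fetching it, for example with `requests.get(url).json()`, needs network access and valid keys.

```python
from urllib.parse import urlencode

# Placeholder credentials: replace with your own Foursquare keys.
CLIENT_ID = "YOUR_CLIENT_ID"
CLIENT_SECRET = "YOUR_CLIENT_SECRET"
VERSION = "20180605"  # v2 API version date (assumed value)

def build_explore_url(lat, lng, radius=10000, limit=100):
    """Build a v2 venues/explore URL centered on (lat, lng)."""
    params = {
        "client_id": CLIENT_ID,
        "client_secret": CLIENT_SECRET,
        "v": VERSION,
        "ll": f"{lat},{lng}",
        "radius": radius,  # search radius in meters (assumed value)
        "limit": limit,    # number of venues requested
    }
    return "https://api.foursquare.com/v2/venues/explore?" + urlencode(params)

url = build_explore_url(40.7128, -74.0060)
print(url)
```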

Analyze the results and extract the location information.
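The venue names and coordinates can be extracted as follows. This assumes the response follows the v2 explore layout (`response` → `groups` → `items` → `venue`). The sample dictionary is hand-made in that assumed shape and is not real API output.

```python
def extract_venues(response_json):
    """Return (name, lat, lng) tuples from a v2 explore response.

    Assumes the layout response -> groups[0] -> items[] -> venue.
    """
    items = response_json["response"]["groups"][0]["items"]
    venues = []
    for item in items:
        venue = item["venue"]
        location = venue["location"]
        venues.append((venue["name"], location["lat"], location["lng"]))
    return venues

# Hand-made response fragment in the assumed shape (not real API output).
sample = {"response": {"groups": [{"items": [
    {"venue": {"name": "Venue A", "location": {"lat": 40.7580, "lng": -73.9855}}},
    {"venue": {"name": "Venue B", "location": {"lat": 40.7061, "lng": -73.9969}}},
]}]}}

print(extract_venues(sample))
```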

Display the venues on the map.

---Q4 ANSWER: The area containing the most of the top 100 venues is also where crime is most concentrated. This suggests that crime is directly related to economic prosperity.

Question 5. Can the crime trend of 2020 be predicted?

Two approaches are used here to predict the number of crimes in 2020:
1. Group the historical data by month, then fit a curve to it to capture the trend over a year. This curve serves as the average prediction for 2020.
2. Use only the data from the most recent 2 years to fit the curve.
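A simple way to prepare the training data for both approaches is to average each calendar month, either over all years or over the last two. The monthly counts here are synthetic: a summer bump added to a level that drops each year. They are not the NYPD data.

```python
import pandas as pd

# Synthetic monthly complaint counts: a level that drops by 100 per year,
# plus a bump of 50 in July and August.
rows = [
    {"year": y, "month": m, "n": 1000 - 100 * (y - 2017) + 50 * (m in (7, 8))}
    for y in (2017, 2018, 2019)
    for m in range(1, 13)
]
counts = pd.DataFrame(rows)

# Approach 1: average each calendar month over all available years.
avg_all = counts.groupby("month")["n"].mean()

# Approach 2: average each calendar month over the two most recent years only.
last_two = counts[counts["year"] >= counts["year"].max() - 1]
avg_recent = last_two.groupby("month")["n"].mean()

print(avg_all.head(3))
print(avg_recent.head(3))
```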

The trend figure shows that crime does not change linearly over the year, so polynomial regression is used.

Fit the model with each degree from 1 to 10 to find the best fit, recording the score and MSE for each degree.
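This sketch runs the degree sweep with NumPy's `Polynomial.fit` on synthetic monthly averages with a sine-shaped summer peak. The notebook may well have used scikit-learn instead. Here the score is computed by hand as R², which is an assumption about what "score" means.

```python
import numpy as np
from numpy.polynomial import Polynomial

# Synthetic average complaints per month: a smooth peak in July.
months = np.arange(1, 13)
counts = 1000 + 150 * np.sin((months - 4) * np.pi / 6)

results = {}
for degree in range(1, 11):
    # Least-squares polynomial fit; Polynomial.fit rescales x internally,
    # which keeps high degrees numerically stable.
    p = Polynomial.fit(months, counts, degree)
    pred = p(months)
    mse = np.mean((counts - pred) ** 2)
    # R^2 score: 1 - residual sum of squares / total sum of squares.
    r2 = 1 - np.sum((counts - pred) ** 2) / np.sum((counts - counts.mean()) ** 2)
    results[degree] = (r2, mse)

for degree, (r2, mse) in results.items():
    print(f"degree={degree:2d}  R2={r2:.4f}  MSE={mse:.2f}")
```

Training MSE can only fall or stay flat as the degree grows, because each higher-degree family contains the lower ones. For that reason, the training score alone cannot pick the degree, and overfitting has to be judged separately.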

Plot the fitted curve for each degree.

Plot the scores against the degree.

Since a high degree may cause overfitting, we select degree 6.

Let's predict 2020 with degree 6.

Next, we build the model using only the data from the most recent 2 years.

Again, try different degrees.

This figure makes it even clearer that degree 6 is appropriate.

Get the prediction with degree 6.

Compare the predictions from the two approaches.

The second prediction is clearly lower than the first. This is because the number of crimes decreases year by year, so the most recent 2 years have lower counts than the historical average. We therefore believe the second prediction is more accurate.
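The gap between the two predictions can be reproduced with synthetic data. Each year has the same seasonal shape, on top of a level that falls by 80 complaints per month each year. Fitting a degree-6 curve to the all-years average and to the last-two-years average shows the second curve sitting below the first.

```python
import numpy as np
from numpy.polynomial import Polynomial

# Synthetic history for 2015-2019: the same seasonal shape each year,
# with an overall level that falls by 80 complaints per month each year.
years = np.arange(2015, 2020)
months = np.arange(1, 13)
season = 150 * np.sin((months - 4) * np.pi / 6)
data = np.array([1300 - 80 * (y - 2015) + season for y in years])  # shape (5, 12)

# Approach 1: degree-6 fit to the average month profile over all years.
way1 = Polynomial.fit(months, data.mean(axis=0), 6)(months)
# Approach 2: degree-6 fit to the average month profile over the last two years.
way2 = Polynomial.fit(months, data[-2:].mean(axis=0), 6)(months)

# With a declining level, the recent-years curve sits below the all-years curve.
print("mean gap per month:", round(float(np.mean(way1 - way2)), 1))
```

Least-squares fitting is linear in the data, so the two curves differ by exactly the difference in average level (120 here). The gap therefore reflects the yearly decline rather than the seasonal shape.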

---Q5 ANSWER: Polynomial regression can be used to predict the number of crimes in 2020. However, there is no actual 2020 data to verify the accuracy of the prediction, so it only gives a general idea of the trend.